

### COMPUTER ORGANIZATION AND DESIGN

The Hardware/Software Interface

# Topic 12

# **Memory Hierarchy**

- Cache (2)

# **MIPS Pipeline Architecture**



# **Block Size Considerations**

- Larger blocks should reduce miss rate
  - Due to spatial locality
- But in a fixed-sized cache
  - Larger blocks ⇒ fewer of them
    - More competition ⇒ increased miss rate
- Larger miss penalty
  - Primarily result of longer time to fetch block
    - Latency to first word
    - Transfer time for the rest of the block
  - Can override benefit of reduced miss rate
  - Early restart and critical-word-first can help



# Cache Example 2

- 4-block cache
- 2 words per block
- direct mapped
- Assuming 7-bit byte addresses





| lw R1 ←               | mem[2          | 22]      |                |
|-----------------------|----------------|----------|----------------|
| Requested<br>mem addr | word<br>addr   | Hit/miss | Cache<br>block |
| 10110 00              | 10 <b>11</b> 0 | Miss     | 11             |

| Word Addr  | Data       |
|------------|------------|
| 00000 (0)  | 0x81230431 |
| 00001 (1)  | 0xABCD3305 |
| 00010 (2)  | 0xFFFF0001 |
| 00011 (3)  | 0xFFFF0002 |
|            |            |
| 10110 (22) | 0x05AC0011 |
|            |            |

| Index | V | Tag | Word 0  | Word 1  |
|-------|---|-----|---------|---------|
| 00    | Ν |     |         |         |
| 01    | Ν |     |         |         |
| 10    | N |     |         |         |
| 11    | Υ | 10  | Mem[22] | Mem[23] |
|       |   |     |         | _       |





| sw \$0 → mem[23]<br>lw R2 ← mem[27] |              |                  |                |
|-------------------------------------|--------------|------------------|----------------|
| Requested<br>mem addr               | word<br>addr | Hit/miss         | Cache<br>block |
| 10111 00                            | 10 11 1      | Hit <sub>K</sub> | 11             |
| 11011 00                            | 11 01 1      | Miss             | 01             |

Hit due to spatial locality

| Index | V | Tag | Word 0  | Word 1           |
|-------|---|-----|---------|------------------|
| 00    | Z |     |         |                  |
| 01    | Y | 11  | Mem[26] | Mem[27] <b>4</b> |
| 10    | Ν |     |         |                  |
| 11    | Υ | 10  | Mem[22] | 0                |

| Word Addr  | Data       |
|------------|------------|
| 00000 (0)  | 0x81230431 |
| 00001 (1)  | 0xABCD3305 |
| 00010 (2)  | 0xFFFF0001 |
| 00011 (3)  | 0xFFFF0002 |
|            |            |
| 10110 (22) | 0x05AC0011 |
| 10111 (23) | 0x0000FFEE |
|            |            |
| 11010 (26) | 0x1153A4D6 |
| 11011 (27) | 0xAABB1234 |
|            |            |
| 11110 (30) | 0x00000000 |
| 11111 (31) | 0x00000000 |



| lw R3 ← mem[6]        |              |          |                |  |
|-----------------------|--------------|----------|----------------|--|
| Requested<br>mem addr | Word<br>addr | Hit/miss | Cache<br>block |  |
| 00110 00              | 00 11 0      | miss     | 11             |  |

| Index | V | Tag | Word 0  | Word 1  |
|-------|---|-----|---------|---------|
| 00    | Z |     |         |         |
| 01    | Y | 11  | Mem[26] | Mem[27] |
| 10    | Ν |     |         |         |
| 11    | Υ | 10  | Mem[6]  | Mem[7]  |

N-to-1 mapping causes competition, original block was replaced

|   | Word Addr  | Data       |
|---|------------|------------|
|   | 00000 (0)  | 0x81230431 |
|   |            |            |
| _ | 00110 (6)  | 0xFFFF0126 |
|   | 00111 (7)  | 0xFFFF0127 |
|   |            | •••        |
|   | 10110 (22) | 0x05AC0011 |
|   | 10111 (23) | 0x0000FFEE |
|   |            | •••        |
|   | 11010 (26) | 0x1153A4D6 |
|   | 11011 (27) | 0xAABB1234 |
|   |            | •••        |
|   | 11110 (30) | 0x00000000 |
|   | 11111 (31) | 0x00000000 |

| lw R4 ← mem [22]      |              |          |                |  |
|-----------------------|--------------|----------|----------------|--|
| Requested<br>mem addr | Word<br>addr | Hit/miss | Cache<br>block |  |
| 10110 00              | 10 11 0      | miss     | 11             |  |

| V | Tag | Word 0         | Word 1  |
|---|-----|----------------|---------|
| N |     |                |         |
| Υ | 11  | Mem[26]        | Mem[27] |
| N |     |                |         |
| Υ | 10  | Mem[22]        | Mem[23] |
|   | Υ   | N<br>Y 11<br>N | N       |

Replaced again

| Word Addr  | Data       |
|------------|------------|
| 00000 (0)  | 0x81230431 |
|            | •••        |
| 00110 (6)  | 0xFFFF0126 |
| 00111 (7)  | 0xFFFF0127 |
|            |            |
| 10110 (22) | 0x05AC0011 |
| 10111 (23) | 0x0000FFEE |
|            |            |
| 11010 (26) | 0x1153A4D6 |
| 11011 (27) | 0xAABB1234 |
|            | •••        |
| 11110 (30) | 0x00000000 |
| 11111 (31) | 0x00000000 |



# **Example 3: Larger Block Size**

- 64 blocks, 4 words/block
  - What cache block number does byte address 1200 map to?
  - Word number = 1200/4 = 300
  - Block (address) number = 300/4 = 75
  - Block index in cache = 75 modulo 64 = 11

| 31 |         | 10 9 | 4      | 3 2            | 1 0            |
|----|---------|------|--------|----------------|----------------|
|    | Tag     | I    | ndex   | Word<br>Offset | Byte<br>Offset |
|    | 22 bits | -    | 6 bits | 2 bits         | 2 bits         |



# **Cache Size in Bits**

- Given
  - 32-bit byte address
  - 2<sup>n</sup> blocks in cache
  - 2<sup>m</sup> words per block, 2<sup>m+2</sup> bytes
- Size of tag field = 32 (n + m + 2)
  - n bits to index blocks in cache
  - m bits used to select words in a block
  - 2 bits used to select the 4 bytes in a word
  - Tag field decreases when n and m increase
- Cache size = 2<sup>n</sup> × (block size + tag size + valid field size)



# **Class Exercise**

- Given
  - 2K blocks in cache
  - 8 words in each block
  - 32-bit byte address 0x810023FE requested by CPU
- Show address of the target cache block and organization of the entire cache



# Miss in Instruction Cache

- 1. Send original PC (PC 4) to memory
  - From the Adder
- 2. Read main (lower level) memory and wait for data
- 3. Write data into cache data field from main memory, write upper bits of address into cache tag field, set valid field
- 4. Restart the missing instruction



# Miss in Data Cache – Reads

- 1. Hold the pipeline
- 2. Read main (lower level) memory and wait for data
- 3. Write data into cache data field from main memory, write upper bits of address into cache tag field, set valid field
- 4. Read cache again, proceed



# **Handling Data Writes – Write Through**

- On data-write (e.g. sw) hit, could just update the block in cache
  - But then cache and memory would be inconsistent
- Write through: also update the word in memory



# Write Through Example Word Addr 0

| Requested mem addr | Word addr | Hit/miss | Cache block |
|--------------------|-----------|----------|-------------|
| 00101 00           | 001 0 1   | hit      | 0           |

| sw R1→mem[5] |    |     |
|--------------|----|-----|
|              | R0 | 20  |
|              | R1 | 23  |
|              | R2 | 36  |
|              | R3 | 15  |
|              | R4 | 87  |
|              | R5 | 62  |
|              | R6 | 99  |
|              | R7 | 178 |
|              |    |     |

| Index | V | Tag | Data   | 7  |  |
|-------|---|-----|--------|----|--|
| 0     | Y | 001 | 140    | 8  |  |
| ,     |   |     | 141→23 | 9  |  |
| 1     | N |     |        | 10 |  |
|       |   |     |        | 11 |  |
|       |   |     |        | 12 |  |
|       |   |     |        | 13 |  |
|       |   |     |        | 14 |  |

**Data** 

141<del>→</del>23



## **Word Addr** Write Through Example

| Requested mem addr | Word addr | Hit/miss | Cache block |
|--------------------|-----------|----------|-------------|
| 00100 00           | 001 0 0   | hit      | 0           |

|    | R1 → mem [5] |    |     |
|----|--------------|----|-----|
| SW | R2→mem[4]    | R0 | 20  |
|    |              | R1 | 23  |
|    |              | R2 | 36  |
|    |              | R3 | 15  |
|    |              | R4 | 87  |
|    |              | R5 | 62  |
|    |              | R6 | 99  |
|    |              | R7 | 178 |
|    |              |    |     |

**CPU** 

|       |   |     |                     | 6  | 615 |
|-------|---|-----|---------------------|----|-----|
| Index | V | Tag | Data                | 7  | 712 |
| 0     | Y | 004 | 140 <del>→</del> 36 | 8  | 3   |
| ī     |   |     | 23                  | 9  | 300 |
| 1     | N |     |                     | 10 | 531 |
|       |   |     |                     | 11 | 153 |
|       |   |     |                     | 12 | 234 |
|       |   |     |                     | 13 | 912 |
|       |   |     |                     | 14 | 0   |

**Data** 

110

120

133

233

140<del>→</del>36

23

10



# Handling Data Writes – Write Through

- But makes writes take longer time
  - Must wait till the update finishes
  - e.g., if base CPI = 1, 10% of instructions are stores, write to memory takes 100 cycles
    - Effective CPI = base CPI + write time (cycles) per instruction
    - Effective CPI =  $1 + 0.1 \times 100 = 11$
- Even worse for write miss
  - Detect a miss on target address
  - Fetch the block from main memory to cache
  - Overwrite the word in cache
  - Write the block back to main memory



# **Write Through Example**

| Requested mem addr | Word addr | Hit/miss | Cache block |
|--------------------|-----------|----------|-------------|
| 01010 00           | 010 1 0   | miss     | 1           |

| sw R1→mem[5] sw R2→mem[4] Sw R4→mem[10] | R0<br>R1<br>R2<br>R3<br>R4<br>R5<br>R6 | <br>20<br>23<br>36<br>15<br>87<br>62<br>99 |
|-----------------------------------------|----------------------------------------|--------------------------------------------|
|                                         | R7                                     | 178                                        |

**CPU** 

| Index | V | Tag | Data |
|-------|---|-----|------|
| 0     | Υ | 001 | 36   |
|       |   |     | 23   |
| 1     | N |     |      |
| ,     |   |     |      |

**Word Addr** 

**Data** 

Miss

# Write Through Example

| Requested mem addr | Word addr | Hit/miss | Cache block |
|--------------------|-----------|----------|-------------|
| 01010 00           | 010 1 0   | miss     | 1           |

| sw R1→mem[5] sw R2→mem[4] Sw R4→mem[10] | R0<br>R1<br>R2<br>R3<br>R4<br>R5<br>R6<br>R7 | <br>20<br>23<br>36<br>15<br>87<br>62<br>99<br>178 |
|-----------------------------------------|----------------------------------------------|---------------------------------------------------|
|                                         |                                              |                                                   |

|       |             |     |      | 6  |  |  |  |  |
|-------|-------------|-----|------|----|--|--|--|--|
| Index | V           | Tag | Data | 7  |  |  |  |  |
| 0     | Υ           | 001 | 36   | 8  |  |  |  |  |
|       |             |     | 23   | 9  |  |  |  |  |
| 1     | Υ           | 010 | 531  | 10 |  |  |  |  |
| l     |             |     | 135  | 10 |  |  |  |  |
|       |             |     | 100  | 71 |  |  |  |  |
|       |             |     |      | 12 |  |  |  |  |
| F     | Fetch block |     |      |    |  |  |  |  |

**Word Addr** 

**Data** 



# Write Through Example Word Addr

| Requested mem addr | Word addr | Hit/miss | Cache block |
|--------------------|-----------|----------|-------------|
| 01010 00           | 010 1 0   | hit      | 1           |

| R1→mem[5] R2→mem[4] <b>R4→mem[10]</b> | R0<br>R1<br>R2<br>R3 | 20<br>23<br>36<br>15 |  |
|---------------------------------------|----------------------|----------------------|--|
|                                       | R4<br>R5             | 87 <b>-</b><br>62    |  |
|                                       | R6                   | 99                   |  |
|                                       | R7                   | 178                  |  |
|                                       |                      |                      |  |

| Index         | V | Tag | Data                        |  |  |
|---------------|---|-----|-----------------------------|--|--|
| 0             | Υ | 001 | 36                          |  |  |
|               |   |     | 23                          |  |  |
| 1             | Y | 010 | <b>5</b> 31 <del>→</del> 87 |  |  |
|               |   |     | 135                         |  |  |
| Write Through |   |     |                             |  |  |

**Data** 

**-**531<del>→</del>87

## **Write Buffer**

- Solution to time consuming write through technique (for both hit and miss)
  - Buffer stores data to be written to memory
    - May have one or more entries
  - CPU proceeds to next step, while letting buffer to complete write through
  - Frees buffer when completing write to memory
  - CPU stalls if buffer is full



# Write Through with Buffer 0

| Requested mem addr | Word addr | Hit/miss | Cache block |
|--------------------|-----------|----------|-------------|
| 01010 00           | 010 1 0   | hit      | 1           |

| SW | R1→mem[5]    |    |     |
|----|--------------|----|-----|
|    | R2 → mem [4] | R0 | 20  |
| SW | R4→mem[10]   | R1 | 23  |
|    |              | R2 | 36  |
|    |              | R3 | 15  |
|    |              | R4 | 87  |
|    |              | R5 | 62  |
|    |              | R6 | 99  |
|    |              | R7 | 178 |
|    |              |    |     |

| Index | V      | Tag | Data                |  |  |  |
|-------|--------|-----|---------------------|--|--|--|
| 0     | Υ      | 001 | 36                  |  |  |  |
|       |        |     | 23                  |  |  |  |
| 1     | Υ      | 010 | 531 <del>→</del> 87 |  |  |  |
|       |        |     | 135                 |  |  |  |
|       |        |     |                     |  |  |  |
|       | 87     |     |                     |  |  |  |
|       | Buffer |     |                     |  |  |  |

**Data** 

10 \\_531 →87

# **Handling Data Writes – Write Back**

- Alternative of write through: On data-write hit, just update the block in cache
  - CPU keeps track of whether each block is dirty (updated with new values)
- Write a block back to memory
  - Only when a dirty block has to be replaced (on miss)
  - More complex than write through



| Requested mem addr | Word addr | Hit/miss | Cache block |
|--------------------|-----------|----------|-------------|
| 00101 00           | 001 0 1   | hit      | 0           |
| 01011 00           | 010 1 1   | hit      | 1           |

| lw       | R3←mem[5]               |    |     |
|----------|-------------------------|----|-----|
|          | R7←mem[11]              | R0 | 20  |
| SW       | R6→mem[11]              | R1 | 23  |
| SW<br>SW | R5→mem[10]<br>R7→mem[6] | R2 | 36  |
| SW       | R0→mem[13]              | R3 | 23  |
|          |                         | R4 | 87  |
|          |                         | R5 | 62  |
|          |                         | R6 | 99  |
|          |                         | R7 | 135 |
|          |                         |    |     |

| Indx | V | D | Tag | Data        |
|------|---|---|-----|-------------|
| 0    | Υ | 0 | 001 | 36          |
|      |   |   |     | <b>2</b> 3  |
| 1    | Υ | 0 | 010 | 87          |
|      |   |   |     | <b>1</b> 35 |
|      |   |   |     |             |

**Word Addr** 

**Data** 

CPU

| Requested mem addr | Word addr | Hit/miss | Cache block |
|--------------------|-----------|----------|-------------|
| 01011 00           | 010 1 1   | hit      | 1           |

| lw       | R3←mem[5]               |    |     |
|----------|-------------------------|----|-----|
| lw       | R7←mem[11]              | R0 | 20  |
| SW       | R6→mem[11]              | R1 | 23  |
| SW<br>SW | R5→mem[10]<br>R7→mem[6] | R2 | 36  |
| SW       | R0→mem[13]              | R3 | 23  |
|          |                         | R4 | 87  |
|          |                         | R5 | 62  |
|          |                         | R6 | 99  |
|          |                         | R7 | 135 |
|          |                         |    |     |

| Indx | V | D | Tag | Data   |
|------|---|---|-----|--------|
| 0    | Υ | 0 | 001 | 36     |
|      |   |   |     | 23     |
| 1    | Y | 1 | 010 | 87     |
|      |   |   |     | 135→99 |

Write cache, Dirty

**Data** 

**Word Addr** 



| Requested mem addr | Word addr | Hit/miss | Cache block |
|--------------------|-----------|----------|-------------|
| 01010 00           | 010 1 0   | hit      | 1           |

| lw | R3←mem[5]                |    |     |
|----|--------------------------|----|-----|
| lw | R7←mem[11]               | R0 | 20  |
| SW | R6→mem[11]               | R1 | 23  |
| SW | $R5 \rightarrow mem[10]$ |    |     |
| SW | R7→mem[6]                | R2 | 36  |
| SW | R0→mem[13]               | R3 | 23  |
|    |                          | R4 | 87  |
|    |                          | R5 | 62  |
|    |                          | R6 | 99  |
|    |                          | R7 | 135 |
|    |                          |    |     |

| Indx | V | D | Tag | Data               |
|------|---|---|-----|--------------------|
| 0    | Y | 0 | 001 | 36                 |
|      |   |   |     | 23                 |
| 1    | Υ | 1 | 010 | 87 <del>→</del> 62 |
|      |   |   |     | 99                 |

Write cache, Dirty

Data

110

**Word Addr** 

120 133

233

36

615

712

6

11

3

300

10 87

135

12 234

13 912

14 0



| Requested mem addr | Word addr | Hit/miss | Cache block |
|--------------------|-----------|----------|-------------|
| 00110 00           | 00110     | miss     | 1           |

| lw       | R3←mem[5]                |    |     |
|----------|--------------------------|----|-----|
| lw       | R7←mem[11]               | R0 | 20  |
| SW<br>SW | R6→mem[11]<br>R5→mem[10] | R1 | 23  |
| SW       | R7→mem[6]                | R2 | 36  |
| SW       | R0→mem[13]               | R3 | 23  |
|          |                          | R4 | 87  |
|          |                          | R5 | 62  |
|          |                          | R6 | 99  |
|          |                          | R7 | 135 |
|          |                          |    |     |

# Indx V D Tag Data 0 Y 0 001 36 23 1 Y 1 010 62 99

Miss match

## Miss

| Word | Word Addr |    | Data |  |
|------|-----------|----|------|--|
|      |           | 0  | 110  |  |
|      | _         | 1  | 120  |  |
|      | ne block  |    | 133  |  |
| 1    |           |    | 233  |  |
|      |           | 4  | 36   |  |
|      |           | 5  | 23   |  |
|      |           | 6  | 615  |  |
| ata  | Ī         | 7  | 712  |  |
| 66   |           | 8  | 3    |  |
| 23   |           | 9  | 300  |  |
| 52   |           | 10 | 87   |  |
| 9    |           | 11 | 135  |  |
|      |           | 12 | 234  |  |
|      |           | 13 | 912  |  |
|      |           | 14 | 0    |  |
|      |           | 15 | 10   |  |



| Requested mem addr | Word addr | Hit/miss | Cache block |
|--------------------|-----------|----------|-------------|
| 00110 00           | 001 1 0   | miss     | 1           |

| lw       | R3←mem[5]            |    |     |
|----------|----------------------|----|-----|
| lw       | R7←mem[11]           | R0 | 20  |
| SW       | R6→mem[11]           | R1 | 23  |
| SW<br>SW | R5→mem[10] R7→mem[6] | R2 | 36  |
| SW       | R0→mem[13]           | R3 | 23  |
|          |                      | R4 | 87  |
|          |                      | R5 | 62  |
|          |                      | R6 | 99  |
|          |                      | R7 | 135 |
|          |                      |    |     |



**Word Addr** 

Data



| Requested mem addr | Word addr | Hit/miss | Cache block |
|--------------------|-----------|----------|-------------|
| 00110 00           | 001 1 0   | miss     | 1           |

| lw | R3←mem[5]                 |       |     |
|----|---------------------------|-------|-----|
| lw | R7←mem[11]                | R0    | 20  |
| SW | R6→mem[11]                | R1    | 23  |
| SW | $R5 \rightarrow mem [10]$ | 1 X 1 | 20  |
| sw | R7→mem[6]                 | R2    | 36  |
| SW | R0→mem[13]                | R3    | 23  |
|    |                           | R4    | 87  |
|    |                           | R5    | 62  |
|    |                           | R6    | 99  |
|    |                           | R7    | 135 |
|    |                           |       |     |

|                   |     |   |     |      | 6  |    |
|-------------------|-----|---|-----|------|----|----|
| Indx              | V   | D | Tag | Data | 7  |    |
| 0                 | Y   | 0 | 001 | 36   | 8  |    |
|                   |     |   |     | 23   | 9  |    |
| 1                 | Υ   | 1 | 010 | 62   |    | O. |
|                   |     |   |     | 00   | 10 | 0  |
|                   |     |   |     | 99   | 11 | 13 |
|                   | _/_ |   |     |      | 12 |    |
| D                 | irt | y |     |      | 13 |    |
| Write back first! |     |   |     |      | 14 |    |
|                   |     |   |     |      | 15 |    |

**Word Addr** 

**Data** 

**7→62** 

35→99



| Requested mem addr | Word addr | Hit/miss | Cache block |
|--------------------|-----------|----------|-------------|
| 00110 00           | 001 1 0   | miss     | 1           |

| lw | R3 <b>←</b> mem[5]       |       |     |
|----|--------------------------|-------|-----|
| lw | R7←mem[11]               | R0    | 20  |
| SW | R6→mem[11]               | R1    | 23  |
| SW | $R5 \rightarrow mem[10]$ | 1 \ 1 | 20  |
| sw | R7→mem[6]                | R2    | 36  |
| SW | R0→mem[13]               | R3    | 23  |
|    |                          | R4    | 87  |
|    |                          | R5    | 62  |
|    |                          | R6    | 99  |
|    |                          | R7    | 135 |
|    |                          |       |     |



**Word Addr** 

Data



| Requested mem addr | Word addr | Hit/miss | Cache block |
|--------------------|-----------|----------|-------------|
| 00110 00           | 001 1 0   | hit      | 1           |

| lw | R3 <b>←</b> mem[5]        |       |     |
|----|---------------------------|-------|-----|
| lw | R7←mem[11]                | R0    | 20  |
| SW | R6→mem[11]                | R1    | 23  |
| SW | $R5 \rightarrow mem [10]$ | 1 × 1 |     |
| sw | R7→mem[6]                 | R2    | 36  |
| SW | R0→mem[13]                | R3    | 23  |
|    |                           | R4    | 87  |
|    |                           | R5    | 62  |
|    |                           | R6    | 99  |
|    |                           | R7    | 135 |
|    |                           |       |     |



Data

**Word Addr** 

**CPU** 

| Requested mem addr | Wo <mark>rd addr</mark> | Hit/miss | Cache block |
|--------------------|-------------------------|----------|-------------|
| 01101 00           | 01101                   | miss     | 0           |

| lw       | R3←mem[5]                |    |     |
|----------|--------------------------|----|-----|
| lw       | R7←mem[11]               | R0 | 20  |
| SW       | R6→mem[11]<br>R5→mem[10] | R1 | 23  |
| SW<br>SW | R7→mem[6]                | R2 | 36  |
| sw       | R0→mem[13]               | R3 | 23  |
|          |                          | R4 | 87  |
|          |                          | R5 | 62  |
|          |                          | R6 | 99  |
|          |                          | R7 | 135 |
|          |                          |    |     |

#### Miss match Indx V Tag **Data** 001 36 23 001 135

## Miss

| Word Addr |                                       | Data |
|-----------|---------------------------------------|------|
|           | 0                                     | 110  |
| o bloo    | 1                                     | 120  |
| e bloc    | 2                                     | 133  |
| 0         | $\begin{array}{c} 2 \\ 3 \end{array}$ | 233  |
|           | 4                                     | 36   |
|           | 5                                     | 23   |
|           | 6                                     | 615  |
| ata       | 7                                     | 712  |
| 86        | 8                                     | 3    |
| 23        | 9                                     | 300  |
| 35        | 10                                    | 62   |
| 12        | 11                                    | 99   |
|           | 12                                    | 234  |
|           | 13                                    | 912  |
|           | 14                                    | 0    |
|           | 15                                    | 10   |
|           |                                       |      |



| Requested mem addr | Word addr | Hit/miss | Cache block |
|--------------------|-----------|----------|-------------|
| 01101 00           | 011 0 1   | miss     | 0           |

| lw | R3←mem[5]                |       |     |
|----|--------------------------|-------|-----|
| lw | R7←mem[11]               | R0    | 20  |
| SW | R6→mem[11]               | R1    | 23  |
| SW | $R5 \rightarrow mem[10]$ | 1 × 1 | 20  |
| SW | R7→mem[6]                | R2    | 36  |
| sw | R0→mem[13]               | R3    | 23  |
|    |                          | R4    | 87  |
|    |                          | R5    | 62  |
|    |                          | R6    | 99  |
|    |                          | R7    | 135 |
|    |                          |       |     |



**Word Addr** 

Data



Replace directly

| Requested mem addr | Word addr | Hit/miss | Cache block |
|--------------------|-----------|----------|-------------|
| 01101 00           | 011 0 1   | hit      | 0           |

| lw | R3 <b>←</b> mem[5] |    |     |
|----|--------------------|----|-----|
| lw | R7←mem[11]         | R0 | 20_ |
| SW | R6→mem[11]         | R1 | 23  |
| SW | R5→mem[10]         |    |     |
| SW | R7→mem[6]          | R2 | 36  |
| SW | R0→mem[13]         | R3 | 23  |
|    |                    | R4 | 87  |
|    |                    | R5 | 62  |
|    |                    | R6 | 99  |
|    |                    | R7 | 135 |
|    |                    |    |     |



Data

**Word Addr** 



# Write Through/Back Sequences

- Write back sequence
  - Two steps:
    - 1. check match,
    - 2. write data
  - Otherwise, will destroy the mismatch block, and there is no backup copy
  - May use write buffer
    - Writing buffer and checking match simultaneously



# Write Through/Back Sequences

- Write through sequence
  - Write data
  - Check for match
- Simultaneously in one step
- Mismatch doesn't matter
  - Because the mismatch block to be replaced anyway
- For hit, saves a step, less time for write through



## Write Allocation on Miss

- Ways of cache handlings for write-through
  - Write Allocate
    - Allocate cache block on miss by fetching corresponding memory block
    - Update cache block
    - Update memory block
  - No Write Allocate
    - Write around: write directly to memory
    - Then fetch from memory to cache
- For write-back
  - Usually fetch the block (write allocate)



# Write Through with no Write Allocation





Source: Wikipedia.org

# Write Back with Write Allocation



Source: Wikipedia.org



## **Example: Intrinsity FastMATH**

- Intrinsity
  - Fabless microprocessor company
  - Acquired by Apple in 2010
- FastMATH Embedded MIPS processor
  - 12-stage pipeline
  - Instruction and data access on each cycle
    - Split cache: separate I-cache and D-cache
    - Each 16KB: 256 blocks × 16 words/block
    - D-cache: write-through or write-back
- SPEC2000 miss rates
  - I-cache: 0.4%
  - D-cache: 11.4%



## **Example: Intrinsity FastMATH**





## **Measuring Cache Performance**

- Components of CPU time
  - Program execution cycles
    - Include cache hit time
  - Memory stall (miss) cycles
    - Mainly from cache misses
- With simplified assumptions:



## Cache Performance Example

#### Given

- I-cache miss rate = 2% (2 misses per 100 instructions)
- D-cache miss rate = 4% (4 misses per 100 memory access instructions)
- Miss penalty = 100 cycles
- Base CPI (ideal cache) = 2
- Load & stores are 36% of instructions
- Miss cycles per instruction



- I-cache:  $100\% \times 2\% \times 100 = 2$
- D-cache:  $36\% \times 4\% \times 100 = 1.44$
- Total CPI = base CPI + Miss (stall) cycles per instruction
  - Actual CPI = 2 + 2 + 1.44 = 5.44
  - Ideal CPU is 5.44/2 =2.72 times faster



## **Average Memory Access Time**

- Hit time is important to performance
- Average memory access time (AMAT)
  - AMAT = Hit time + Miss rate × Miss penalty
- Example
  - CPU with 1ns clock, hit time = 1 cycle, miss penalty = 20 cycles, I-cache miss rate = 5%
  - $\blacksquare$  AMAT = 1 + 0.05  $\times$  20 = 2ns
    - 2 cycles per instruction



## **Reducing Miss Penalty**

### Early restart

- Restart execution as soon as the requested word if available, instead of waiting for the entire block
- More effective for instruction memory because instructions are accessed sequentially
- Critical word first
  - Requires specially organized memory
  - Transfer the requested words first



## Reducing Miss Penalty by Main Memory organization

- Use DRAMs for main memory
  - Fixed width (e.g., 1 word)
  - Connected by fixed-width bus
    - Bus clock is typically slower than CPU clock
- Example cache block read
  - 1 bus cycle for address transfer
  - 15 bus cycles per DRAM access
  - 1 bus cycle per data transfer



## **Reducing Miss Penalty**

by Increasing Memory Bandwidth



- For 4-word block, 1-word-wide DRAM
  - Miss penalty =  $1 + 4 \times 15 + 4 \times 1 = 65$  bus cycles
  - Bandwidth = 16 bytes / 65 cycles = 0.25 B/cycle

a. One-word-wide memory organization



## **Reducing Miss Penalty**

by Increasing Memory Bandwidth



- 2-word wide memory
  - Miss penalty =  $1 + 2 \times 15 + 2 \times 1 = 33$  bus cycles
  - Bandwidth = 16 bytes / 33 cycles = 0.48 B/cycle

b. Wider memory organization

- 4-bank interleaved memory
  - Miss penalty =  $1 + 15 + 4 \times 1 = 20$  bus cycles
  - Bandwidth = 16 bytes / 20 cycles = 0.8B/cycle



c. Interleaved memory organization

